- What is a (data) plot?
- What are the three most important data plots?
2016-06-28
How would you describe this plot?
Elements of a plot
Additional components
A variable in the data is directly mapped to an element in the plot
glimpse(autism)
# Observations: 604
# Variables: 7
# $ childid  (int) 1, 1, 1, 1, 1, 10, 10, 10, 10, 100, 100, 100, 100, 10...
# $ sicdegp  (fctr) high, high, high, high, high, low, low, low, low, hi...
# $ age2     (dbl) 0, 1, 3, 7, 11, 0, 1, 7, 11, 0, 1, 3, 7, 0, 1, 7, 11,...
# $ vsae     (int) 6, 7, 18, 25, 27, 9, 11, 18, 39, 15, 24, 37, 135, 8, ...
# $ gender   (fctr) male, male, male, male, male, male, male, male, male...
# $ race     (fctr) white, white, white, white, white, white, white, whi...
# $ bestest2 (fctr) pdd, pdd, pdd, pdd, pdd, autism, autism, autism, aut...
ggplot(autism, aes(x=age2, y=vsae)) + geom_point()
How is the data mapped to graphical elements?
ggplot(autism, aes(x=age2, y=vsae)) + geom_jitter()
How is the data mapped to graphical elements?
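One way to answer this question is to look at the values ggplot2 actually draws. The sketch below uses a small made-up data frame (`toy` is hypothetical, not the autism data) and `layer_data()` to compare `geom_point()` with `geom_jitter()`.

```r
library(ggplot2)

# Hypothetical toy data standing in for the autism data (values are made up)
toy <- data.frame(
  childid = rep(1:3, each = 4),
  age2    = rep(c(0, 1, 3, 7), times = 3),
  vsae    = c(6, 7, 18, 25, 9, 11, 18, 39, 15, 24, 37, 135)
)

p_point  <- ggplot(toy, aes(x = age2, y = vsae)) + geom_point()
p_jitter <- ggplot(toy, aes(x = age2, y = vsae)) + geom_jitter()

# layer_data() returns the data as it is drawn for a layer
drawn_point  <- layer_data(p_point)
drawn_jitter <- layer_data(p_jitter)

# geom_point(): every observation sits exactly at its age2 value
point_matches <- isTRUE(all.equal(sort(as.numeric(drawn_point$x)), sort(toy$age2)))

# geom_jitter(): a small random offset is added, so positions are moved
jitter_moved <- any(as.numeric(drawn_jitter$x) != toy$age2)
```

In both plots age2 is mapped to horizontal position and vsae to vertical position; jittering only perturbs where the points land so that overlapping observations become visible.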
ggplot(autism, aes(x=age2, y=vsae)) + geom_point() + geom_line()
ggplot(autism, aes(x=age2, y=vsae, group=childid)) + geom_point() + geom_line()
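The effect of `group=childid` can be checked on a small example. This is a minimal sketch with made-up data (`toy` is hypothetical): without a grouping variable all observations are connected as one line, with it each child gets a separate line.

```r
library(ggplot2)

# Hypothetical toy data: 3 children measured at 4 ages (values are made up)
toy <- data.frame(
  childid = rep(1:3, each = 4),
  age2    = rep(c(0, 1, 3, 7), times = 3),
  vsae    = c(6, 7, 18, 25, 9, 11, 18, 39, 15, 24, 37, 135)
)

p_all   <- ggplot(toy, aes(x = age2, y = vsae)) + geom_line()
p_child <- ggplot(toy, aes(x = age2, y = vsae, group = childid)) + geom_line()

# Number of separate lines drawn in each plot
n_groups_all   <- length(unique(layer_data(p_all)$group))
n_groups_child <- length(unique(layer_data(p_child)$group))
```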
ggplot(autism, aes(x=age2, y=vsae, group=childid)) + geom_point() + geom_line(alpha=0.5)
ggplot(autism, aes(x=age2, y=vsae, group=childid)) + geom_point() + geom_line(alpha=0.5) + scale_y_log10()
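What `scale_y_log10()` does to the mapping can be sketched as follows, again with made-up data (`toy` is hypothetical): the y values are transformed to log10 before they are drawn, while the axis labels stay on the original scale.

```r
library(ggplot2)

# Hypothetical toy data (values are made up, all strictly positive)
toy <- data.frame(
  childid = rep(1:3, each = 4),
  age2    = rep(c(0, 1, 3, 7), times = 3),
  vsae    = c(6, 7, 18, 25, 9, 11, 18, 39, 15, 24, 37, 135)
)

p_log <- ggplot(toy, aes(x = age2, y = vsae, group = childid)) +
  geom_point() +
  scale_y_log10()

# The drawn y positions are log10(vsae), not vsae itself
drawn_y  <- as.numeric(layer_data(p_log)$y)
is_log10 <- isTRUE(all.equal(sort(drawn_y), sort(log10(toy$vsae))))
```

On this scale, equal vertical distances correspond to equal ratios, which spreads out the many small values and compresses the few large ones.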
ggplot(autism, aes(x=age2, y=vsae, group=childid, colour=bestest2)) + geom_point() + geom_line(alpha=0.5) + scale_y_log10()
ggplot(autism, aes(x=age2, y=vsae, colour=bestest2)) + geom_point(alpha=0.1) + geom_line(aes(group=childid), alpha=0.1) + geom_smooth(se=FALSE) + scale_y_log10()
What do we learn about autism, age, and the diagnosis at age 2?
How is the data mapped to graphical elements?
That's not what I wanted ….
41% Of Fliers Think You’re Rude If You Recline Your Seat
# Observations: 1,040
# Variables: 27
# $ RespondentID (dbl) ...
# $ How often do you travel by plane? (chr) ...
# $ Do you ever recline your seat when you fly? (chr) ...
# $ How tall are you? (int) ...
# $ Do you have any children under 18? (chr) ...
# $ In a row of three seats, who should get to use the two arm rests? (chr) ...
# $ In a row of two seats, who should get to use the middle arm rest? (chr) ...
# $ Who should have control over the window shade? (chr) ...
# $ Is itrude to move to an unsold seat on a plane? (chr) ...
# $ Generally speaking, is it rude to say more than a few words tothe stranger sitting next to you on a plane? (chr) ...
# $ On a 6 hour flight from NYC to LA, how many times is it acceptable to get up if you're not in an aisle seat? (chr) ...
# $ Under normal circumstances, does a person who reclines their seat during a flight have any obligation to the person sitting behind them? (chr) ...
# $ Is itrude to recline your seat on a plane? (chr) ...
# $ Given the opportunity, would you eliminate the possibility of reclining seats on planes entirely? (chr) ...
# $ Is it rude to ask someone to switch seats with you in order to be closer to friends? (chr) ...
# $ Is itrude to ask someone to switch seats with you in order to be closer to family? (chr) ...
# $ Is it rude to wake a passenger up if you are trying to go to the bathroom? (chr) ...
# $ Is itrude to wake a passenger up if you are trying to walk around? (chr) ...
# $ In general, is itrude to bring a baby on a plane? (chr) ...
# $ In general, is it rude to knowingly bring unruly children on a plane? (chr) ...
# $ Have you ever used personal electronics during take off or landing in violation of a flight attendant's direction? (chr) ...
# $ Have you ever smoked a cigarette in an airplane bathroom when it was against the rules? (chr) ...
# $ Gender (chr) ...
# $ Age (chr) ...
# $ Household Income (chr) ...
# $ Education (chr) ...
# $ Location (Census Region) (chr) ...
Mix of categorical and quantitative variables. What mappings are appropriate? Bars (area) for counts of categories, side-by-side boxplots for a categorical-quantitative pair.
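The two mappings can be sketched on a few made-up responses (`toy` is hypothetical, not the survey data). For a categorical variable `geom_bar()` counts the rows in each category; for a categorical-quantitative pair `geom_boxplot()` draws one box per category.

```r
library(ggplot2)

# Hypothetical toy responses (values are made up)
toy <- data.frame(
  freq   = c("Never", "Once a year or less", "Once a year or less",
             "Every day", "Never", "Once a year or less"),
  height = c(160, 172, 181, 168, 175, 190)
)

# Categorical variable: bar height is the number of rows per category
p_bar <- ggplot(toy, aes(x = freq)) + geom_bar()
bar_counts      <- layer_data(p_bar)$count
category_counts <- as.vector(table(toy$freq))

# Categorical + quantitative pair: one boxplot per category
p_box   <- ggplot(toy, aes(x = freq, y = height)) + geom_boxplot()
n_boxes <- nrow(layer_data(p_box))
```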
ggplot(fly, aes(x=`How often do you travel by plane?`)) + geom_bar() + coord_flip()
Categories are not sorted
fly$`How often do you travel by plane?` <-
factor(fly$`How often do you travel by plane?`, levels=c(
"Never","Once a year or less","Once a month or less",
"A few times per month","A few times per week","Every day"))
ggplot(fly, aes(x=`How often do you travel by plane?`)) + geom_bar() + coord_flip()
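Why the re-levelling fixes the bar order can be seen in base R. A minimal sketch with made-up responses (`freq` and the value "Rarely" are hypothetical): by default factor levels are sorted alphabetically, and `levels=` sets the order explicitly. Any value not listed in `levels=` becomes NA, so the level names must match the data exactly.

```r
# Hypothetical responses (not the survey data); "Rarely" is deliberately
# not one of the listed levels
freq <- c("Never", "Every day", "Once a year or less", "Never", "Rarely")

# Default: levels are sorted alphabetically
default_levels <- levels(factor(freq))

# Explicit order, from least to most frequent
wanted  <- c("Never", "Once a year or less", "Every day")
ordered <- factor(freq, levels = wanted)

ordered_levels <- levels(ordered)

# Values missing from `levels=` are silently turned into NA
n_dropped <- sum(is.na(ordered))
```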
fly_sub <- fly %>% filter(`How often do you travel by plane?` %in%
c("Once a year or less","Once a month or less")) %>%
filter(!is.na(`Do you ever recline your seat when you fly?`)) %>%
filter(!is.na(Age)) %>% filter(!is.na(Gender))
fly_sub$`Do you ever recline your seat when you fly?` <- factor(
fly_sub$`Do you ever recline your seat when you fly?`, levels=c(
"Never","Once in a while","About half the time",
"Usually","Always"))
ggplot(fly_sub, aes(y=`How tall are you?`, x=`Do you ever recline your seat when you fly?`)) + geom_boxplot() + coord_flip()
Take a look at the ggplot2 Cheat sheet
How many geoms are available in ggplot2? What is geom_rug?
What is the difference between colour and fill?
What does coord_fixed() do? What is the difference between this and using theme(aspect.ratio=...)?
What are scales? How many numeric transformation scales are there?
What are position adjustments? When would they be used?
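As a starting point for the last question, here is a small sketch of three position adjustments for bars, using made-up data (`toy_fly` is hypothetical). It compares the default stacking with `position="dodge"` and `position="fill"`.

```r
library(ggplot2)

# Hypothetical toy responses (values are made up)
toy_fly <- data.frame(
  Gender = c("Female", "Female", "Female", "Male", "Male", "Male", "Male", "Female"),
  answer = c("No", "No", "Yes", "No", "Yes", "Yes", "Yes", "No")
)

base <- ggplot(toy_fly, aes(x = answer, fill = Gender))

p_stack <- base + geom_bar()                    # default: segments stacked, height = count
p_dodge <- base + geom_bar(position = "dodge")  # segments placed side by side
p_fill  <- base + geom_bar(position = "fill")   # stacked and rescaled to a total of 1

# Top of each bar under stacking and under filling
d_stack <- layer_data(p_stack)
d_fill  <- layer_data(p_fill)
top_stack <- as.vector(tapply(d_stack$ymax, as.numeric(d_stack$x), max))
top_fill  <- as.vector(tapply(d_fill$ymax,  as.numeric(d_fill$x),  max))
```

Stacking keeps the counts, so bars differ in height; filling discards the totals and shows only the proportions within each bar.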
Use your cheat sheet to work out how to make a plot to explore the relationship between
Do you ever recline your seat when you fly? and Is it rude to recline your seat on a plane? (stored in the data as "Is itrude to recline your seat on a plane?").
ggplot(fly_sub, aes(x=`In general, is itrude to bring a baby on a plane?`)) + geom_bar() + coord_flip() + facet_wrap(~Gender)
fly_sub$Age <- factor(fly_sub$Age, levels=c("18-29","30-44","45-60","> 60"))
ggplot(fly_sub, aes(x=`In general, is itrude to bring a baby on a plane?`)) +
geom_bar() + coord_flip() + facet_grid(Age~Gender)
p <- ggplot(fly_sub, aes(x=`In general, is itrude to bring a baby on a plane?`,
fill=Gender)) +
geom_bar(position="fill") + coord_flip() + facet_wrap(~Age, ncol=5)
p
What do we learn?
p + scale_fill_brewer(palette="Dark2")
library(scales)
library(dichromat)
clrs <- hue_pal()(3)
p + theme(legend.position = "none")
clrs <- dichromat(hue_pal()(3))
p + scale_fill_manual("", values=clrs) + theme(legend.position = "none")
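The colour change can be inspected directly. This sketch assumes the scales and dichromat packages are installed, and relies on `dichromat()` simulating deuteranopia by default (its `type` argument defaults to "deutan").

```r
library(scales)
library(dichromat)

# The three default ggplot2 hues
clrs <- hue_pal()(3)

# The same colours as they might appear with red-green colour blindness
clrs_cb <- dichromat(clrs)

# Compare the red, green and blue components before and after
rgb_before <- col2rgb(clrs)
rgb_after  <- col2rgb(clrs_cb)
```

Comparing `rgb_before` and `rgb_after` column by column shows which of the default hues move closer together under the simulation.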
Can you find the odd one out?
Is it easier now?
ggplot(fly_sub, aes(x=`In general, is itrude to bring a baby on a plane?`,
fill=Gender)) +
geom_bar(position="fill") + coord_flip() + facet_wrap(~Age, ncol=5)
With this arrangement we can see the proportion of each gender within each rudeness category, and compare these across age groups. How could we arrange this differently?
ggplot(fly_sub, aes(x=Gender,
fill=`In general, is itrude to bring a baby on a plane?`)) +
geom_bar(position="fill") + coord_flip() + facet_wrap(~Age, ncol=5) + theme(legend.position="bottom")
What is different about the comparison now?
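The two arrangements differ in which variable the proportions are conditioned on. A small base R sketch with made-up data (`toy_fly` is hypothetical) makes this explicit using `prop.table()`.

```r
# Hypothetical toy responses (values are made up)
toy_fly <- data.frame(
  Gender = c("Female", "Female", "Female", "Male", "Male", "Male", "Male", "Female"),
  answer = c("No", "No", "Yes", "No", "Yes", "Yes", "Yes", "No")
)

tab <- table(toy_fly$answer, toy_fly$Gender)

# x = answer, fill = Gender: gender shares within each answer (rows sum to 1)
gender_within_answer <- prop.table(tab, margin = 1)

# x = Gender, fill = answer: answer shares within each gender (columns sum to 1)
answer_within_gender <- prop.table(tab, margin = 2)
```

With answers on the axis we compare the gender mix across answers; with gender on the axis we compare how each gender answered, which is usually the more natural question.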
ggplot(fly_sub, aes(x=Age,
fill=`In general, is itrude to bring a baby on a plane?`)) +
geom_bar(position="fill") + coord_flip() + facet_wrap(~Gender, ncol=5) +
theme(legend.position="bottom")
The ggthemes package has many different styles for plots. Other packages provide further themes and palettes, such as xkcd, skittles, wesanderson, beyonce, …
library(xkcd)
ggplot(fly_sub, aes(x=Gender,
fill=`In general, is itrude to bring a baby on a plane?`)) +
geom_bar(position="fill") + coord_flip() + facet_wrap(~Age, ncol=5) +
theme_xkcd() + theme(legend.position="bottom")
See the vignette for instructions on installing the xkcd font.
Compile the rmarkdown document that you have put together thus far in the workshop!
This work is licensed under the Creative Commons Attribution-Noncommercial 3.0 United States License. To view a copy of this license, visit http://creativecommons.org/licenses/by-nc/3.0/us/ or send a letter to Creative Commons, 171 Second Street, Suite 300, San Francisco, California, 94105, USA.